Heterogeneous Ensemble for Feature Drifts in Data Streams

نویسندگان

  • Hai-Long Nguyen
  • Yew-Kwong Woon
  • Wee Keong Ng
  • Li Wan
چکیده

The nature of data streams requires classification algorithms to be real-time, efficient, and able to cope with high-dimensional data that are continuously arriving. It is a known fact that in high-dimensional datasets, not all features are critical for training a classifier. To improve the performance of data stream classification, we propose an algorithm called HEFT-Stream (Heterogeneous Ensemble with Feature drifT for Data Streams) that incorporates feature selection into a heterogeneous ensemble to adapt to different types of concept drifts. As an example of the proposed framework, we first modify the FCBF [13] algorithm so that it dynamically update the relevant feature subsets for data streams. Next, a heterogeneous ensemble is constructed based on different online classifiers, including Online Naive Bayes and CVFDT [5]. Empirical results show that our ensemble classifier outperforms state-of-the-art ensemble classifiers (AWE [21] and OnlineBagging [15]) in terms of accuracy, speed, and scalability. The success of HEFT-Stream opens new research directions in understanding the relationship between feature selection techniques and ensemble learning to achieve better classification performance.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Handling Gradual Concept Drift in Stream Data

Data streams are sequence of data examples that continuously arrive at time-varying and possibly unbound streams. These data streams are potentially huge in size and thus it is impossible to process many data mining techniques (e.g., sensor readings, call records, web page visits). Tachiniques for classification fail to successfully process data streams because of two factors: their overwhelmin...

متن کامل

The Gradual Resampling Ensemble for mining imbalanced data streams with concept drift

Knowledge extraction from data streams has received increasing interest in recent years. However, most of the existing studies assume that the class distribution of data streams is relatively balanced. The reaction of concept drifts is more difficult if a data stream is class imbalanced. Current oversampling methods generally selectively absorb the previously received minority examples into the...

متن کامل

Self-Adaptive Ensemble Classifier for Handling Complex Concept Drift

In increasing number of real world applications, data are presented as streams that may evolve over time and this is known by concept drift. Handling concept drift through ensemble classifiers has received a great interest in last decades. The success of these ensemble methods relies on their diversity. Accordingly, various diversity techniques can be used like block-based data, weighting-data ...

متن کامل

Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection

Due to the rise of technology, the possibility of fraud in different areas such as banking has been increased. Credit card fraud is a crucial problem in banking and its danger is over increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...

متن کامل

Dynamic Cost-sensitive Ensemble Classification based on Extreme Learning Machine for Mining Imbalanced Massive Data Streams

In order to lower the classification cost and improve the performance of the classifier, this paper proposes the approach of the dynamic cost-sensitive ensemble classification based on extreme learning machine for imbalanced massive data streams (DCECIMDS). Firstly, this paper gives the method of concept drifts detection by extracting the attributive characters of imbalanced massive data stream...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012